<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: How to run s6-svscan as process 1</title>
    <meta name="Description" content="s6: s6-svscan as init" />
    <meta name="Keywords" content="s6 supervision svscan s6-svscan init process boot 1" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> How to run s6-svscan as process 1 </h1>

<p>
 <em> Since 2015-06-17, if you're a Linux user, you can use the
<a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package to help you do so! Please read this documentation page first,
though, it will help you understand what s6-linux-init does. </em>
</p>

<p>
 It is possible to run s6-svscan as process 1, i.e. the <tt>init</tt>
process. However, that does not mean you can directly <em>boot</em>
on s6-svscan; that little program cannot do everything
your stock init does. Replacing the <tt>init</tt> process requires a
bit of understanding of what is going on.
</p>

<a name="stages">
<h2> The three stages of init </h2>
</a>

<p>
<small> Okay, it's actually four, but the fourth stage is an implementation
detail that users don't care about, so we'll stick with three. </small>
</p>

<p>
 The life of a Unix machine has three stages.
 <small>Yes, three.</small>
</p>

<ol>
 <li> The <em>early initialization</em> phase. It starts when the
kernel launches the first userland process, traditionally called <tt>init</tt>.
During this phase, init is the only lasting process; its duty is to
prepare the machine for the start of <em>other</em> long-lived processes,
i.e. services. Work such as mounting filesystems, setting the system clock,
etc. can be done at this point. This phase ends when process 1 launches
its first services. </li>
 <li> The <em>cruising</em> phase. This is the "normal", stable state of an
up and running Unix machine. Early work is done, and init launches and
maintains <em>services</em>, i.e. long-lived processes such as gettys,
the ssh server, and so on. During this phase, init's duties are to reap
orphaned zombies and to supervise services - also allowing the administrator
to add or remove services. This phase ends when the administrator
requires a shutdown. </li>
 <li> The <em>shutdown</em> phase. Everything is cleaned up, services are
stopped, filesystems are unmounted, the machine is getting ready to be
halted. At the end of this phase, all processes are killed, first with
a SIGTERM, then with a SIGKILL (to catch processes that resist SIGTERM).
The only processes that survive it are process 1; if this process is
<a href="s6-svscan.html">s6-svscan</a> and its <a href="scandir.html">scandir</a>
is not empty, then the supervision tree is restarted. </li>
 <li> The <em>hardware shutdown</em> phase. The system clock is stored,
filesystems are unmounted, and the system call that reboots the machine or
powers it off is called. </li>
</ol>

<p>
<small> Unless you're implementing a shutdown procedure over a supervision
tree, you can absolutely consider that the hardware shutdown is part of stage 3. </small>
</p>

<p>
 As you can see, process 1's duties are <em>radically different</em> from
one stage to the next, and init has the most work when the machine
is booting or shutting down, which means a normally negligible fraction
of the time it is up. The only common thing is that at no point is process
1 allowed to exit.
</p>

<p>
 Still, all common init systems insist that the same <tt>init</tt>
executable must handle these three stages. From System V init to launchd,
via busybox init, you name it - one init program from bootup to shutdown.
No wonder those programs, even basic ones, seem complex to write and
complex to understand!
</p>

<p>
Even the <a href="http://smarden.org/runit/runit.8.html">runit</a>
program, designed with supervision in mind, remains as process 1 all the
time; at least runit makes things simple by clearly separating the three
stages and delegating every stage's work to a different script that is
<em>not</em> run as process 1. (Since runit does not distinguish between
stage 3 and stage 4, it needs very careful handling of the
<tt>kill -9 -1</tt> part of stage 3: getting <tt>/etc/runit/3</tt> killed
before it unmounts the filesystems would be bad.)
</p>

<p>
 One init to rule them all?
<a href="https://en.wikipedia.org/wiki/Porgy_and_Bess">It ain't necessarily so!</a>
</p>

<a name="stage2">
<h2> The role of s6-svscan </h2>
</a>

<p>
 init does not have the right to die, but fortunately, <em>it has the right
to <a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/execve.html">execve()</a>!</em>
During stage 2, why use precious RAM, or at best, swap space, to store data
that are only relevant to stages 1 or 3-4? It only makes sense to have an
init process that handles stage 1, then executes into an init process that
handles stage 2, and when told to shutdown, this "stage 2" init executes into
a "stage 3" init which just performs shutdown. Just as runit does with the
<tt>/etc/runit/[123]</tt> scripts, but exec'ing the scripts as process 1
instead of forking them.
</p>

<p>
It becomes clear now that
<a href="s6-svscan.html">s6-svscan</a> is perfectly suited to
exactly fulfill process 1's role <strong>during stage 2</strong>.
</p>

<ul>
 <li> It does not die </li>
 <li> The reaper takes care of every zombie on the system </li>
 <li> The scanner maintains services alive </li>
 <li> It can be sent commands via the <a href="s6-svscanctl.html">s6-svscanctl</a>
interface </li>
 <li> It execs into a given script when told to </li>
</ul>

<p>
 However, an init process for stage 1 and another one for stage 3 are still
needed. Fortunately, those processes are very easy to design! The only
difficulty here is that they're heavily system-dependent, so it's not possible
to provide a stage 1 init and a stage 3 init that will work everywhere.
s6 was designed to be as portable as possible, and it should run on virtually
every Unix platform; but outside of stage 2 is where portability stops.
</p>

<p>
 The <a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package provides a tool, <tt>s6-linux-init-maker</tt>, to automatically
create a suitable stage 1 init (so, the <tt>/sbin/init</tt> binary) for
Linux.
It is also possible to write similar tools for other operating systems,
but the details are heavily system-dependent.
</p>

<p>
 For the adventurous and people who need to do this by hand, though, here are
are some general design tips.
</p>

<a name="stage1">
<h2> How to design a stage 1 init </h2>
</a>

<h3> What stage 1 init must do </h3>

<ul>
 <li> Prepare an initial <a href="scandir.html">scan directory</a>, say in
<tt>/run/service</tt>, with a few vital services, such as s6-svscan's own logger,
and an early getty (in case debugging is needed). That implies mounting a
read-write filesystem, creating it in RAM if needed, if the root filesystem
is read-only. </li>
 <li> Either perform all the one-time initialization, as stage 1
<a href="http://smarden.org/runit/">runit</a> does; </li>
 <li> or fork a process that will perform most of the one-time initialization
once s6-svscan is in charge. </li>
 <li> Be extremely simple and not fail, because recovery is almost impossible
here. </li>
</ul>

<p>
 Unlike the <tt>/etc/runit/1</tt> script, an init-stage1 script running as
process 1 has nothing to back it up, and if it fails and dies, the machine
crashes. Does that mean the runit approach is better? It's certainly safer,
but not necessarily better, because init-stage1 can be made <em>extremely
small</em>, to the point it is practically failproof, and if it fails, it
means something is so wrong that you
would have had to reboot the machine with <tt>init=/bin/sh</tt> anyway.
</p>

<p>
 To make init-stage1 as small as possible, only this realization is needed:
you do not need to perform all of the one-time initialization tasks before
launching s6-svscan. Actually, once init-stage1 has made it possible for
s6-svscan to run, it can fork a background "init-stage2" process and exec
into s6-svscan immediately! The "init-stage2" process can then pursue the
one-time initialization, with a big advantage over the "init-stage1"
process: s6-svscan is running, as well as a few vital services, and if
something bad happens, there's a getty for the administrator to log on.
No need to play fancy tricks with <tt>/dev/console</tt> anymore! Yes,
the theoretical separation in 3 stages is a bit more flexible in practice:
the "stage 2" process 1 can be already running when a part of the
"stage 1" one-time tasks are still being run.
</p>

<p>
 Of course, that means that the scan directory is still incomplete when
s6-svscan first starts, because most services can't yet be run, for
lack of mounted filesystems, network etc. The "init-stage2" one-time
initialization script must populate the scan directory when it has made
it possible for all wanted services to run, and trigger the scanner.
Once all the one-time tasks are done, the scan directory is fully
populated and the scanner has been triggered, the machine is fully
operational and in stage 2, and the "init-stage2" script can die.
</p>

<h3> Is it possible to write stage 1 init in a scripting language? </h3>

<p>
 It is very possible, and if you are attempting to write your own stage 1,
I definitely recommend it. If you are using
s6-svscan as stage 2 init, stage 1 init should be simple enough
that it can be written in any scripting language you want, just
as <tt>/etc/runit/1</tt> is if you're using runit. And since it
should be so small, the performance impact will be negligible,
while maintainability is enhanced. Definitely make your stage 1
init a script.
</p>

<p>
 Of course, most people will use the <em>shell</em> as scripting
language; however, I advocate the use of
<a href="//skarnet.org/software/execline/">execline</a>
for this, and not only for the obvious reasons. Piping s6-svscan's
stderr to a logging service before said service is even up requires
some <a href="#log">tricky fifo handling</a> that execline can do
and the shell cannot.
</p>

<a name="stage3">
<h2> How to design a stage 3-4 init </h2>
</a>

<p>
 If you're using s6-svscan as stage 2 init on <tt>/run/service</tt>, then
stage 3 init is naturally the <tt>/run/service/.s6-svscan/finish</tt> program.
Of course, <tt>/run/service/.s6-svscan/finish</tt> can be a symbolic link
to anything else; just make sure it points to something in the root
filesystem (unless your program is an execline script, in which case
it is not even necessary).
</p>

<h3> What stage 3-4 init must do </h3>

<ul>
 <li> Destroy the supervision tree and stop all services </li>
 <li> Kill all processes <em>save itself</em>, first gently, then harshly, and <em>reap all the zombies</em>. </li>
 <li> Up until that point we were in stage 3; now we're in stage 4. </li>
 <li> Unmount all the filesystems </li>
 <li> Halt or reboot the machine, depending on what root asked for </li>
</ul>

<p>
 This is seemingly very simple, even simpler than stage 1, but experience
shows that it's trickier than it looks.
</p>

<p>
 One tricky part is the <tt>kill -9 -1</tt> operation at the end of
stage 3: you must make sure that <em>process 1</em> regains control and keeps
running after it, because it will be the only process left alive. If you
are running a stage 3 script as process 1, it is almost automatic: your
script survives the kill and continues running, up into stage 4. If you
are using another model, the behaviour becomes system-dependent: your
script may or may not survive the kill, so on systems where it does not,
you will have to design a way to regain control in order to accomplish
stage 4 tasks.
</p>

<p>
 Another tricky part, that is only apparent with practice, is solidity.
It is even more vital that <em>nothing fails</em> during stages 3 and 4
than it is in stage 1, because in stage 1, the worst that can happen is
that the machine does not boot, whereas in stages 3 and 4, the worst that
can happen is that the machine <em>does not shut down</em>, and that is
a much bigger issue.
</p>

<p>
 For these reasons, I now recommend <em>not</em> tearing down the
supervision tree for stages 3-4. It is easier to work in a stable
environment, as a regular process, than it is to manage a whole shutdown
sequence as pid 1: the presence of s6-svscan as pid 1, and of a working
supervision tree, is a pillar you can rely on, and with experience I find
it a good idea to keep the supervision infrastructure running until the end.
Of course, that requires the scandir, and the active supervision directories,
to be on a RAM filesystem such as <tt>tmpfs</tt>; that is good policy
anyway.
</p>

<h3> Is it possible to write stage 3 init in a scripting language? </h3>

<p>
 Yes, definitely, just like stage 1.
</p>

<p>
 However, you really should leave <tt>/run/service/.s6-svscan/finish</tt>
(and the other scripts in <tt>/run/service/.s6-svscan</tt>) alone, and
write your shutdown sequence without dismantling the supervision tree.
You will still have to stop most of the services, but s6-svscan should
stay. For a more in-depth study of what to do in stages 3-4 and how
to do it, you can look at the source of <tt>s6-linux-init-shutdownd</tt>
in the <a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package.
</p>


<a name="log">
<h2> How to log the supervision tree's messages </h2>
</a>

<p>
 When the Unix kernel launches your (stage 1) init process, it does it
with descriptors 0, 1 and 2 open and reading from or writing to
<tt>/dev/console</tt>. This is okay for the early boot: you actually
want early error messages to be displayed to the system console. But
this is not okay for stage 2: the system console should only be used
to display extremely serious error messages such as kernel errors, or
errors from the logging system itself; everything else should be
handled by the logging system, following the
<a href="s6-log.html#loggingchain">logging chain</a> mechanism. The
supervision tree's messages should go to the catch-all logger instead
of the system console. (And the console should never be read, so no
program should run with <tt>/dev/console</tt> as stdin, but this is easy
enough to fix: s6-svscan will be started with stdin redirected from
<tt>/dev/null</tt>.)
</p>

<p>
 The catch-all logger is a service, and we want <em>every</em>
service to run under the supervision tree. Chicken and egg problem:
before starting s6-svscan, we must redirect s6-svscan's output to
the input of a program that will only be started once s6-svscan is
running and can start services.
</p>

<p>
 There are several solutions to this problem, but the simplest one is
to use a FIFO, a.k.a. named pipe. s6-svscan's stdout and stderr can
be redirected to a named pipe before s6-svscan is run, and the
catch-all logger service can be made to read from this named pipe.
Only two minor problems remain:
</p>

<ul>
 <li> If s6-svscan or s6-supervise writes to the FIFO before there is
a reader, i.e. before the catch-all logging service is started, the
write will fail (and a SIGPIPE will be emitted). This is not a real issue
for an s6 installation because s6-svscan and s6-supervise ignore SIGPIPE,
and they only write
to their stderr if an error occurs; and if an error occurs before they are
able to start the catch-all logger, this means that the system is seriously
damaged (as if an error occurs during stage 1) and the only solution is
to reboot with <tt>init=/bin/sh</tt> anyway. </li>
 <li> Normal Unix semantics <em>do not allow</em> a writer to open a
FIFO before there is a reader: if there is no reader when the FIFO is
opened for writing, the <tt>open()</tt> system call <em>blocks</em>
until a reader appears. This is obviously not what we want: we want
to be able to <em>actually start</em> s6-svscan with its stdout and
stderr pointing to the logging FIFO, even without a reader process,
and we want it to run normally so it can start the logging service
that will provide such a reader process. </li>
</ul>

<p>
 This second point cannot be solved in a shell script, and that is why
you are discouraged to write your stage 1 init script in the shell
language: you cannot properly set up a FIFO output for s6-svscan without
resorting to horrible and unreliable hacks involving a temporary background
FIFO reader process.
</p>

<p>
 Instead, you are encouraged to use the
<a href="//skarnet.org/software/execline/">execline</a> language -
or, at least,
the <a href="//skarnet.org/software/execline/redirfd.html">redirfd</a>
command, which is part of the execline distribution. The
<a href="//skarnet.org/software/execline/redirfd.html">redirfd</a>
command does just the right amount of trickery with FIFOs for you to be
able to properly redirect process 1's stdout and stderr to the logging FIFO
without blocking: <tt>redirfd -w 1 /run/service/s6-svscan-log/fifo</tt> blocks
if there's no process reading on <tt>/run/service/s6-svscan-log/fifo</tt>, but
<tt>redirfd -wnb 1 /run/service/s6-svscan-log/fifo</tt> <em>does not</em>.
</p>

<p>
 This trick with FIFOs can even be used to avoid potential race conditions
in the one-time initialization script that runs in stage 2. If forked from
init-stage1 right before executing s6-svscan, depending on the scheduler
mood, this script may actually run a long way before s6-svscan is actually
executed and running the initial services - and may do dangerous things,
such as writing messages to the logging FIFO before there's a reader, and
eating a SIGPIPE and dying without completing the initialization. To avoid
that and be sure that s6-svscan really runs and initial services are really
started before the stage 2 init script is allowed to continue, it is possible
to redirect the child script's output (stdout and/or stderr) <em>once again</em>
to the logging FIFO, but in the normal way without redirfd trickery,  before
it execs into the init-stage2 script. So, the child process blocks on the
FIFO until a reader appears, while process 1 - which does not block - execs
into s6-svscan and starts the logging service, which then opens the logging
FIFO for reading and unblocks the child process, which then runs the
initialization tasks with the guarantee that s6-svscan is running.
</p>

<p>
 It really is simpler than it sounds. :-)
</p>

<h2> A working example </h2>

<p>
 This whole page may sound very theoretical, dry, wordy, and hard to
grasp without a live example to try things on; unfortunately, s6 cannot provide
live examples without becoming system-specific.
</p>

<p>
 However, the
<a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package provides you with the
<a href="//skarnet.org/software/s6-linux-init/s6-linux-init-maker.html">s6-linux-init-maker</a>
command, which produces a set of working scripts, including a script
that is suitable as <tt>/sbin/init</tt>, for you to study and edit.
You can <em>run</em> the <tt>s6-linux-init-maker</tt> command even
on non-Linux systems: it will produce scripts that do not work as
is for another OS, but can still be used for study and as a basis for
a working stage 1 script.
</p>

</body>
</html>
